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Abstract: We propose a comprehensive framework for additive regression 
models for non-Gaussian functional responses, allowing for multiple (par¬ 
tially) nested or crossed functional random effects with flexible correlation 
structures for, e.g., spatial, temporal, or longitudinal functional data as well 
as linear and nonlinear effects of functional and scalar covariates that may 
vary smoothly over the index of the functional response. Our implementa¬ 
tion handles functional responses from any exponential family distribution 
as well as many others like Beta- or scaled and shifted ^-distributions. De¬ 
velopment is motivated by and evaluated on an application to large-scale 
longitudinal feeding records of pigs. Results in extensive simulation studies 
as well as replications of two previously published simulation studies for 
generalized functional mixed models demonstrate the good performance of 
our proposal. The approach is implemented in well-documented open source 
software in the pffr function in R-package refund. 


1. Introduction 

Data sets in which measurements consist of curves or images instead of scalars 

i.e., functional data - are becoming ever more common in many areas of 
application. This is due to the increasing affordability and deployment of sensors 
like accelerometers or spectroscopes, high-throughput imaging technologies and 
automated logging equipment that continuously records conditions over time. 
Recent methodological development in this area has been rapid and intense, see 
Morris [13] for a review of the state of the art for regression models for functional 
data. 

In this work, we extend the general framework for functional additive mixed 
models for potentially correlated functional Gaussian responses described in 
Scheipl et al. [19] to non-Gaussian functional responses. The development is 
motivated by and evaluated on an animal husbandry dataset in which the feed¬ 
ing behavior of growing-finishing pigs was monitored continuously over 3 months 
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[11, 3]. These are non-Gaussian functional data in the sense that the underly¬ 
ing probability of feeding is assumed to be a continuous function over time, 
while the available data are sequences of binary indicators (’’feeding: yes/no”) 
evaluated at the temporal resolution of the sensors. Aggregating these binary in¬ 
dicators over given time intervals, we get time series of counts or proportions for 
which (truncated) Poisson, Negative Binomial, Binomial or Beta distributional 
assumptions could be appropriate. Another example of non-Gaussian functional 
data with large practical relevance would be continuously valued functional re¬ 
sponses with heavy-tailed measurement errors for which a scaled f-distribution 
could be appropriate. 

We briefly summarize the most relevant prior work on regression models for 
non-Gaussian functional responses. The fundamental work of Hall et al. [6] re¬ 
lates observed functional binary or count data to a latent Gaussian process (GP) 
through a link function. This is the underlying idea of almost all the works that 
follow. They differ primarily in 1) how the latent functional processes are rep¬ 
resented (i.e., either spline, wavelet or functional principal component (FPC) 
representations or full GP models), 2) to what extent additional covariate infor¬ 
mation can be included, 3) whether they allow the modelling of dependencies 
between and along the functional responses, 4) which distributions are available 
for the responses, 5) whether the functional data has to be available on a joint 
regular grid, and 6) in the availability of documented and performant software 
implementations. In Hall et al. [6], the latent GP is represented in terms of its 
FPCs and neither correlated responses nor covariates are accommodated. No 
implementation is publicly available. A Bayesian variant of Hall et al. [6] is pro¬ 
vided by van der Linde [25]. Zhu et al. [32] describe a robustified wavelet-based 
functional mixed model with scalar covariates for error-contaminated continuous 
functional responses on regular grids. No implementation was publicly available 
at the time of writing. Serban et al. [20] extend the approach of Hall et al. [6] to 
multilevel binary data without covariates and provide a rudimentary problem- 
specific implementation. Wang and Shi [26] describe an empirical Bayesian ap¬ 
proach for latent Gaussian process regression models for data from exponential 
families with scalar and concurrent functional covariates. The available imple¬ 
mentation does not accommodate covariate effects and is limited to binary data 
with curve-specific i. i. d. functional random effects, see Section 4.3 for a sys¬ 
tematic comparison with our proposal based on a replication of their simulation 
study. Li et al. [9] present a model for concurrent binary and continuous func¬ 
tional observations. Association between the two is modeled via cross-correlated 
(latent) FPC scores and cannot take into account any other covariate effects or 
dependency structures. A similar approach is described in Tidemann-Miller [24, 
ch. 3]. Goldsmith et al. [4] develop a fully Bayesian approach which uses latent 
FPCs in a spline basis representation to represent multilevel functional ran¬ 
dom effects and linear functional effects of scalar covariates. They provide an 
implementation for binary outcomes with a logit link function, see Section 4.4 
for a systematic comparison with our proposal based on a replication of their 
simulation study. Gertheiss et al. [3] describe a marginal GEE-type approach 
for correlated binary functional responses with concurrent functional covariates. 
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Brockhaus et al. [2] develop a framework for estimating flexible functional re¬ 
gression models via boosting with similar flexibility as ours, implemented in R 
[15] package FD boost [1], The package additionally implements functional quan¬ 
tile regression models, which are not included in the framework we present here. 
Since their implementation is based on a component-wise gradient boosting al¬ 
gorithm, they cannot provide hypothesis tests or resampling-free construction of 
confidence intervals, however, and also have to rely on computationally intensive 
resampling methods for hyperparameter tuning. 

Compared to previous work, the novel contribution of this work is the develop¬ 
ment, implementation and evaluation of a comprehensive maximum likelihood- 
based inferential framework for generalized functional additive mixed models 
(GFAMMs) for potentially correlated functional responses. Our proposal accom¬ 
modates diverse latent-scale correlation structures as well as flexible modeling 
of the conditional mean structure with multiple linear and non-linear effects 
of both scalar and functional covariates and their interactions. Our proposal 
is implemented in the full generality described here for both regular grid data 
and sparse or irregularly observed functional responses in the pffr function in 
R package refund [7]. Available response distributions include all exponential 
family distributions as well as Beta, scaled and shifted t-, Negative Binomial, 
Tweedie and zero-inflated Poisson distributions, each with various link func¬ 
tions, as well as cumulative threshold models for ordered categorical responses. 
With the exception of Brockhaus et al. [2], none of the previous proposals in 
this area achieve anything close to this level of generality, not to mention of¬ 
fer publicly available, widely applicable open-source implementations. Since our 
framework is a natural extension of previous work done on generalized addi¬ 
tive mixed models for scalar data and builds on the high performing, flexible 
implementations available for them, we can directly make use of many results 
from this literature, such as improved confidence intervals and tests for smooth 
effects [10, 29]. 

The remainder of this work is structured as follows: Section 2 introduces no¬ 
tation and the theoretical and inferential framework for our model class. Section 
3 presents results for our application. Section 4 summarizes results of our exten¬ 
sive validation on synthetic data and of our partial replication of the simulation 
studies of Wang and Shi [26] and Goldsmith et al. [4]. Section 5 concludes. 

2. Model 

2.1. Model Structure 

In what follows, we discuss structured additive regression models of the general 
form 

yi(t) ~ T{m{t),v) 

R 

9((*i(t)) = Vi(t) = ^2fr{Xri,t), 

r—1 


(i) 
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where, for each t, Ui{t), i = 1,..., n, is a random variable from some distribution 
T with conditional expectation E(yi(t)\Xi,t, v) = fii(t) observed over a domain 
T and an optional vector of nuisance parameters v. We use g(-) to denote the 
known link function. Our implementation allows analysts to choose T from the 
exponential family distributions as well as Tweedie, Negative Binomial, Beta, 
ordered categorical, zero-inflated Poisson and scaled and shifted ^-distributions. 
Note that, in the special case of ordered categorical responses, /z*(t) is not the 
conditional mean of the response itself, but that of a latent variable whose value 
determines the response category. 

Each term in the additive predictor is a function of a) the index t of the 
response and b) a subset X r of the complete covariate set X potentially includ¬ 
ing scalar and functional covariates and (partially) nested or crossed grouping 
factors. Note that the definition also includes functional random effects b g (t) 
for a grouping variable g with M levels. These are modeled as realizations of 
a mean-zero Gaussian random process on {1 ,,M} x T with a general co- 
variance function K b (m,m',t,t') = Cov(6 s , m (f), & ffjm '(f')) that is smooth in t, 
where m, to' denote different levels of g. Table 1 shows how a selection of the 
most frequently used effect types fit into this framework. 

We approximate each term f r (X r , t) by a linear combination of basis functions 
given by the tensor product of marginal bases for X r and t. Since the basis 
has to be rich enough to ensure sufficient flexibility, Section 2.4 describes a 
penalized likelihood approach that stabilizes estimates by suppressing variability 
of the effects that is not strongly supported by the data and finds a data-driven 
compromise between goodness of fit and simplicity of the fitted effects. 


Table 1 

A selection of possible model terms /,. (X r , t) for model ( 1 ). All effects can be constant in t 

as well. 


X r 

type of effect 

fr{X r ,t) 

0 (none) 

smooth intercept 

Po{t) 

scalar covariate 2 

linear or smooth effect 

z/ 3 (t), f(z,t) 

two scalars zi, 22 

linear or smooth interac- 

2122/ 3 (f), 


tion 

21/(22, f), 

f(zi,%2,t) 

functional covariate x(s) 

linear or smooth (histori- 

f x(s)/ 3 (s, t)ds, 


cal) functional effect 

Si'ltf x(s)p( 8 ,t)ds, 

functional covariate v(t) 

concurrent effects 

f F(x(s ), s, t)ds 
v(t)/ 3 (t), 

fW(t),t) 

functional covariates 

concurrent interactions 

v(t)w(t)P{t), 

v(t),w{t) 


f(v{t),w(t),t) 

grouping variable g 

functional random inter- 

bg(t) 

grouping variable g, scalar 2 

cept 

functional random slope 

zb g (f ) 

curve indicator i 

smooth functional resid¬ 
ual 

e* 0 ) 
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2.2. Data and Notation 


In practice, functional responses yi(t) are observed on a grid of Tj points 
t,j = (tn,... which can be irregular and/or sparse. Let yu = yi(ti) and 

Vi = (yni • • •) ViTi)- To fit the model, we form y = (yj ,..., y^) T and 
t = ( tj ,..., t^) T , two N = Tj-vectors that contain the concatenated ob¬ 
served responses and their argument values, respectively. Let X r i contain the 
observed values of X r associated with a given y.;. Model (1) can then be ex¬ 
pressed as 


yu ~ F(yu,v) 

R 

yi.yil ) ^ ' fr ( ri 5 tu ) 


r =1 


( 2 ) 


for i = 1,..., n and l = 1,..., Tj. Let f(t ) denote the vector of function eval¬ 
uations of / for each entry in the vector t and let f(x, t) denote the vector of 
evaluations of / for each combination of rows in the vectors or matrices x , t. In 
the following, we let Xj = T to simplify notation, but our approach is equally 
suited to data on irregular grids. For regular grids, each observed value in any 
X r that is constant over t is simply repeated X times to match up with the 
corresponding entry in the N = nT-ve ctor y. 


2.3. Tensor product representation of effects 

We approximate each term f r (X r ,t) by a linear combination of basis functions 
defined on the product space of the two spaces: one for the covariates in X r 
and one over T, where each marginal basis is associated with a corresponding 
marginal penalty. A very versatile method to construct basis function evalua¬ 
tions on such a joint space is given by the row tensor product of marginal bases 
evaluated on X r and t [e.g. 27, ch. 4.1.8]. Let 1^ = (1,..., 1) T denote a d-vector 
of ones. The row tensor product of an m x a matrix A and an m x b matrix B 
is defined as the m x ab matrix A© B = (A©!^)^!.,/ ©B), where © denotes 
the Kronecker product and • denotes element-wise multiplication. Specifically, 
for each of the terms, 

fr(X r , t) « ($ xr 0 $ tr .) e r =$ r 0 r , (3) 

JVXl JVxif XT . NXK tr K Ir K tr xl 

where & xr contains the evaluations of a suitable marginal basis for the covari- 
ate(s) in X r and contains the evaluations of a marginal basis in t with K xr 
and K tr basis functions, respectively. The shape of the function is determined by 
the vector of coefficients 9 r . A corresponding penalty term can be defined using 
the Kronecker sum of the marginal penalty matrices P xr and Pt r associated 
with each basis [27, ch. 4.1], i.e. 

pen(0 r .|At r , A X r) = 9 r P r (\t r , X xr )0 r , 
where P r (\± r , A xr ^ = \ xr P xr © IK tr T A trl-K xr ^ Ptr- 


( 4 ) 
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P xr and P tr are known and fixed positive semi-definite penalty matrices and 
A tr and X xr are positive smoothing parameters controlling the trade-off between 
goodness of fit and the smoothness of f r {X r ,t) in X r and t , respectively. 

This approach is extremely versatile and powerful as it allows analysts to pick 
and choose bases and penalties best suited to the problem at hand. Any basis 
and penalty over T (for example, incorporating monotonicity or periodicity con¬ 
straints) can be combined with any basis and penalty over X r . We provide some 
concrete examples for frequently encountered effects: For a functional intercept 
Po(t), = Ijv and P xr = 0. For a linear functional effect of a scalar covariate 

z(3(t ), & xr = z ® It and P xr = 0, where z = (zi,. .z n ) T contains the ob¬ 
served values of a scalar covariate z. For a smooth functional effect of a scalar 
covariate & xr = [0/ t (Zi)][<J l <’i a .,. ® It and P xr is the penalty associated 

with the basis functions 4>h(z). 

For a linear effect of a functional covariate x(s)/3(s,t)ds, 

&xr = ( W s ■ X ) $ s , 

NxS NxS S X K xr 

where W s contains suitable quadrature weights for the numerical integration and 
zeroes for combinations of s and t not inside the integration range [l(t),u(t)]. 
X = [Ti(sfc)])|).|s(8)lT contains the repeated evaluations of x(s) on the observed 
grid (si,..., ss) and 3> s = [</>fc(sfc)]i<£fiL,. th e evaluations of basis functions 
4>h{s) for the basis expansion of 0(s,t) over s. P xr is again simply the penalty 
matrix associated with the basis functions <f>h(s ). 

For a functional random slope zb g (t) for a grouping factor g with K xr = M 
levels, = (diag(z)G) <8> It, where G = [8(gi = m)]Jis an incidence 
matrix mapping each observation to its group level. P xr is typically I\j for 
i. i. d. random effects, but can also be any inverse correlation or covariance matrix 
defining the expected similarity of levels of g with each other, defined for example 
via spatial or temporal correlation functions or other similarity measures defined 
between the levels of g. 

In general, 3>t r and Pt r can be any matrix of basis function evaluations = 
In ® over ^ a corresponding penalty matrix. For effects that 

are constant over t, we simply use <J?t r = 1 ; y and Pt r = 0. 

Note that for data on irregular f-grids, the stacking via “®1 t” in the expressions 
above simply has to be replaced by using a different suitable repetition pattern. 
Scheipl et al. [19] discuss tensor product representations of many additional 
effects available in our implementation. 

As for other penalized spline models, fits are usually robust against increases 
in K tr and K xr once a sufficiently large number of basis functions has been cho¬ 
sen since it is the penalty parameters Xt r , X xr that control the effective degrees 
of freedom of the fit, c.f. Ruppert [18], Wood [27, Ch. 4.1.7]. Pya and Wood [14] 
describe and implement a test procedure to determine sufficiently large basis 
dimensions that is also applicable to our model class. 
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2-4■ Inference and Implementation 

Let contain the concatenated design matrices associ¬ 

ated with the different model terms and 0 = {Oj ,..., 0j) T the respective 
stacked coefficient vectors 9 r . Let A = (Ad, A x i,..., XtR, X x r)- The penalized 
log-likelihood to be maximized is given by 

R 

4(0, A, v\y) = £(p, v\y ) - \ pen(0 r |A tr , A xr ) (5) 

r—1 

where p = g~ 1 {$9) and £(p, v\y) = YaLi v\yu) is the log-likelihood 
function implied by the respective response distribution J 7 (p,u,u) given in (2). 

Recent results indicate that REML-based inference for this class of penalized 
regression models can be preferable to GCV-based optimization [16, 28, Section 
4]. Wood [30] describes a numerically stable implementation for directly optimiz¬ 
ing a Laplace-approximate marginal likelihood of the smoothing parameters A 
which we summarize below. Note that this is equivalent to REML optimization 
in the Gaussian response case. The idea is to iteratively perform optimization 
over A and then estimate basis coefficients 0 given A via standard penalized 
likelihood methods. 

We first define a Laplace approximation £la( A, i/|y) « f £ p (9, \,u\y)d6 to 
the marginal (restricted) ML score suitable for optimization. 

Let H = — d 2 £ p (9, A, v\y)/(d0d9 T ) denote the negative Hessian of the penal¬ 
ized log-likelihood (5), and let 0 = arg maxg £ p (9, A, o\y) for fixed A, v with 
fi = g~ 1 (&0). The approximate marginal (restricted) ML criterion can then be 
defined as 

R 

t-LA{\v\y) = t{p,v\y) - ^pen(0,.|A tr , X xr ) - log |P,.(A tr , A xr .)| + ) - 

r—1 ' ' 

\ log \H\ + id 0 log(27r), 

where | A\ + denotes the generalized determinant, i.e., the product of the positive 
eigenvalues of A, and ci 0 is the sum of the rank deficiencies of the P r (X tr , A xr )> r = 

l,...,i?. 

The R package mgev [27] provides an efficient and numerically stable imple¬ 
mentation for the iterative optimization of (6) that 1) finds 0 for each candidate 
A via penalized iteratively re-weighted least squares, 2) computes the gradient 
and Hessian of Ila (A, o\y) w.r.t. log(A) via implicit differentiation, 3) updates 
log(A) via Newton’s method and 4) estimates v either as part of the maxi¬ 
mization for A or based on the Pearson statistic of the fitted model. See Wood 
[28, 30], Wood et al. [31] for technical details and numerical considerations. 

Confidence intervals [10] and tests [29] for the estimated effects are based on 
the asymptotic normality of ML estimates. Confidence intervals use the variance 
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of the distance between the estimators and true effects. Similar to the idea be¬ 
hind the MSE, this accounts for both the variance and the penalization induced 
bias (shrinkage effect) of the spline estimators. 

We provide an implementation of the full generality of the proposed models 
in the pffr() function in the R package refund [7] and use the mgcv [28] imple¬ 
mentation of (scalar) generalized additive mixed models as computational back¬ 
end. The pffrO function provides a formula-based wrapper for the definition 
of a broad variety of functional effects as in Section 2.3 as well as convenience 
functions for summarizing and visualizing model fits and generating predictions. 

3. Application to Pigs’ Feeding Behavior 

3.1. Data 

We use data on pigs’ feeding behavior collected in the ICT-AGRI era-net project 
“PIGWISE”, funded by the European Union. In this project, high frequency 
radio frequency identification (HF RFID) antennas were installed above the 
troughs of pig barns to register the feeding times of pigs equipped with passive 
RFID tags [11]. We are interested in modelling the binary functional data that 
arises from measuring the proximity of the pig to the trough (yes-no) every 10 
secs over 102 days available for each pig. The raw data for one such pig, called 
‘pig 57’ whose behavior we analyse in depth is shown in Figure B.l in Appendix 
B. Such models of (a proxy of) individual feeding behavior can be useful for 
ethology research as well as monitoring individual pigs’ health status and/or 
quality of the available feed stock, c.f. Gertheiss et al. [3]. Available covariates 
include the time of day, the age of the pig (i.e, the day of observation), as well 
as the barn’s temperature and humidity. 

3.2. Model 

Due to pronounced differences in feeding behavior between individual pigs [c.f. 
3, Figure 4], we focus on pig-wise models and model the observed feeding rate 
for a single pig for each day i = 1,..., 102 as a smooth function over the daytime 
t. For our model, we aggregate the originally observed binary feeding indicators 
yi(t), which are measured every 10 seconds, over 10 min intervals into yi(t). This 
temporal resolution is sufficient for the purposes of our modeling effort. We then 
model 


Vi(t)\Xi ~ Bin(60,7T = /q(f)); 



As is typical for applied problems such as this, many different distributional 
assumptions for yi{t)\Xi and specifications of fr(X r i, t) are conceivable. 
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In this case, both Beta and Negative Binomial distributions yielded less plausi¬ 
ble and more unstable fits than the Binomial distribution. With respect to the 
additive predictor, the effect of temperature or humidity (i.e., X r i = humj(£)), 
for example, could be modeled as a linear functional effect hum (s)/ 3 (s, t)ds , 

which represents the cumulative effect of humidity exposure over the time win¬ 
dow l(t) < t < u(t), or the concurrent effect of humidity, either linearly via 
hwa(t)/ 3 h{t) or non-linearly via /(hum(£),t). Auto-regressive and lagged effects 
of previous feeding behavior (i.e., X r i = ?/,(£)) could be modeled similarly ei¬ 
ther as the cumulative effect of previous feeding over a certain time window 
(//r/° mm 2/(s)/3(M)ds) or nonlinear - S),t)) or linear (y^t - S)/ 3 (t)) 

effects of the lagged responses with a pre-specified time lag 6. Finally, our im¬ 
plementation also offers diverse possibilities for accounting for aging effects and 
day-to-day variations in behavior (i.e., X r i = i). Aging effects that result in a 
gradual change of daily feeding rates over time could be represented as ?/3(£) for 
a gradual linear change or as f(i,t) for a nonlinear smooth change over days 
i. Less systematic day-to-day variations can be modeled as daily functional 
random intercepts &i(£), potentially auto-correlated over i to encourage similar 
shapes for neighboring days. In this application, the performances of most of 
these models on the validation set were fairly similar and the subsequent section 
presents results for a methodologically interesting model that was among the 
most accurate models on the validation set. 

3.3. Results 

In what follows, we present detailed results for an auto-regressive binomial logit 
model with smoothly varying day effects for pig 57 (see Figure B.l, Appendix 

B): 

nt— lOmin 

logit (m*(£)) = Po(t) + /(*,£) + / yi(s)P(t,s)ds. 

Jt- 31i 

This model assumes that feeding behavior during the previous 3 hours affects 
the current feeding rate. We use periodic P-spline bases over t to enforce sim¬ 
ilar f r (X r i,t) for t =00:00h and t =23:59h for all r. The term /(«,£) can be 
interpreted as a non-linear aging effect on the feeding rates or as a functional 
random day effect whose shapes vary smoothly across days i. This model con¬ 
tains dim(A) = 5 smoothing parameters (1 for the functional intercept, 2 each 
for the daily and auto-regressive effects), dim(<?) = 106 spline coefficients (24 
for the intercept, 25 for the autoregressive effect, 56 for the day effect) and 
N = nT = 9648 observations in the training data. We leave out every third 
day to serve as external validation data for evaluating the generalizability of the 
model. The fit takes about 1 minute on a modern desktop PC. Appendix B con¬ 
tains detailed results for an alternative model specification, Section 4.1 describes 
results for synthetic data with similar structure and size. 

Beside giving insight into pigs’ feeding patterns and how these patterns 
change as the pigs age, the model reported here enables short-term predictions 
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of feeding probabilities for a given pig based on its previous feeding behavior. 
This could be very helpful when using the RFID system for surveillance and 
early identification of pigs showing unusual feeding behavior. Unusual feeding 
behavior is then indicated by model predictions that are consistently wrong; i.e, 
the pig is not behaving as expected. Such discrepancies can then indicate prob¬ 
lems such as disease or low-quality feed stock. For the auto-regressive model 
discussed here, only very short-term predictions 10 minutes into the future can 
be generated easily as yi(t — lOmin) is required as a covariate value. Other 
model formulations without such auto-regressive terms or larger lead times of 
auto-regressive terms will allow more long-term forecasting. More long-term 
forecasts for auto-regressive models could also be achieved by using predictions 
jjiit) instead of observed values of yi{t) as inputs for the forecast. These can 
be generated sequentially for timepoints farther and farther in the future, but 
errors and uncertainties will accumulate correspondingly. A detailed analysis of 
this issue is beyond the scope of this paper, Tashman [22] gives an overview. 
In addition to the lagged response value, also the aging effect is needed. 

Here, we could either use the value from the preceding day i — 1, if it can be 
assumed that the pig’s (expected) behavior does not change substantially from 
one day to the next; or do some extrapolation. 

Figure 1 shows fitted and observed values for 12 selected days from the train¬ 
ing data (top) as well as estimated and observed values for 12 selected days 
from the validation data (bottom). The model is able to reproduce many of 
the feeding episodes, in the sense that peaks in the estimated probability curves 
mostly line up well with observed spikes of yi{t). This model explains about 24% 
of the deviance and, taking the mean over all days and time-points, achieves a 
Brier score of about 0.024 on both the training data and the validation data, 
i.e., we see no evidence for overfitting with this model specification. This model 
performs somewhat better on the validation data than an alternative model 
specification with i. i. d. random functional day effects &i(f), c.f. Appendix B, 
Figure B.2. 

Figure 2 shows the estimated components of the additive predictor for the 
model with smoothly varying day effects. The functional intercept in the left 
top panel of Figure 2 shows two clear peaks of the feeding rate around lOh 
and 18h, and very low feeding activity from 19h to 23h and from 2h to 6h. A 
two-peak profile has been identified previously as the typical feeding behavior 
of pigs [12, 5]. Our analysis exploiting the rich information provided by the new 
RFID antennas used [11], however, gives pig-specific results, whereas previous 
studies typically reported on group characteristics; see, e.g., Hyun et al. [8] and 
Guillemet et al. [5]. Furthermore, existing results usually aggregated over time 
periods. 

The estimated smoothly varying day effect (top right panel) shows increased 
feeding rates in the early morning in the first few days and a corresponding 
strong reduction in feeding rates in the early morning towards the end of the 
fattening period, as well as a tendency for increased feeding activity to concen¬ 
trate more strongly in two periods around 9h and 21h towards the end of the 
observation period, a pattern also visible in the raw data shown in Figure B.l 
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fx(t) and y(t) for selected days: training data 
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ft(t) and y(t) for selected days: validation data 



Fig 1. Training (top) and validation data (bottom) results for pig 57 for selected days with a 
smoothly varying day effect. Black lines for fitted (top) and predicted (bottom) values jafft), 
red for observed feeding rate yi(t)/6 0 . Numbers above each panel denote the day. Vertical axis 
on ^ff-scale. 
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Fig 2 . Top row: Estimated functional intercept and smoothly varying day effects for the model 
with smoothly varying day effect. 

Bottom: Estimated coefficient surface for cumulative auto-regressive effect. Point-wise inter¬ 
vals are =L 2 standard errors. 


(Appendix B). The cumulative auto-regressive effect of feeding during the previ¬ 
ous 3 hours is displayed in the bottom row of Figure 2. Unsurprisingly, feeding 
behavior in the immediate past is associated positively with current feeding 
(c.f. the blue region for s — t « 0), especially during off-peak times in the early 
morning where a higher propensity for feeding is not already modeled by the 
global functional intercept. The model also finds a negative association between 
prior feeding during the night and feeding in the very early hours of the morning 
that takes place 1 to 3 hours later (c.f. the red region around t = 0). Confidence 
intervals for all three effects show that they can be estimated fairly precisely. 

4. Simulation Study 

This section describes results on binomial (Section 4.1) and Beta-, t- and neg¬ 
ative binomial-distributed data (4.2). Sections 4.3 and 4.4 present replications 
of (parts of) the simulation studies on functional random intercept models in 
Wang and Shi [2G] and Goldsmith et al. [4], respectively. Note that neither of 
these competing approaches are able to fit nonlinear effects of scalar covariates, 
effects of functional covariates, functional random slopes or correlated func¬ 
tional random effects in their current implementation, while pffr does. Both 
approaches are implemented only for binary data, not the broad class of distri¬ 
butions available for pf f r-models. All results presented in this section can be re- 
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produced with the code supplement available for this work at http: //stat. uni- 
muenchen.de/~scheipl/downloads/gfamm-code.zip. 

To evaluate results on artificial data, we use the relative root integrated mean 
square error rRIMSE of the estimates evaluated on equidistant grid points over 


T: 


rRIMSE (f r (X r ,t)) 


N 


\T\ V - ' — f(X r j,tl)) 2 

nT ^ sd (/(Af ri ,t))2 


where sd (f(X ri ,t)) is the empirical standard deviation of [f(X ri , 

i.e., the variability of the estimand values over t. We use the relative RIMSE 

rather than RIMSE(/ r (A’ r , t)) = J2i,i(fr( x n,ti) ~ fr(X ri ,t{)) 2 in order 

to make results more comparable between estimands with different numerical 
ranges and to give a more intuitively accessible measure of the estimation error, 
i.e., the relative size of the estimation error compared to the variability of the 
estimand across T. 


4.1. Synthetic Data: Binomial Additive Mixed Model 

We describe an extensive simulation study to evaluate the quality of estimation 
for various potential model specifications for the PIGWISE data presented in 
Section 3.1. We simulate Binomial data yi(t) ~ B (trials, 7r = logit -1 ( 77 * (f))) 
with trials = 10, 60, 200 for n = 100 observations on T = 150 equidistant grid- 
points over [0,1] for a variety of true additive predictors ryit) = fr(%ri, t). 

The subsequent paragraph describes the various f r {X r i,t) we investigate. 

The additive predictor always includes a functional intercept /3o(t). For mod¬ 
elling day-to-day differences in feeding rates, the additive predictor includes an 
i. i. d. functional random intercept bi(t) (setting: ri) or a smoothly varying day 
effect f(i,t) (setting: day). Note that the “true” functional random intercepts 
were generated with Laplace-distributed spline coefficients in order to mimic the 
spiky nature of the application data. For modeling the auto-regressive effect of 
past feeding behavior, the additive predictor can include one of 

• a time-varying nonlinear term f(yi(t — .005), t) (lag), 

• a time-constant nonlinear term f(yi{t — .005)) (lag.c), 

• a cumulative linear term over the previous 0.3 time units f w yi(s)/3(t, s)ds ; 

W = {s : t — 0.3 < s < t} (ff .3), 

• a cumulative linear term over the previous 0.6 time units f w yi(s)/3(t, s)ds ; 

W = {s : t — 0.6 < s < t} (ff .6), 

where yi(s) = (yi(s) — y{s)) denotes the centered responses and y(s) the mean 
response function. The simulated data also contains humidity and temperature 
profiles generated by drawing random FPC score vectors from the respective 
empirical FPC score distributions estimated from the (centered) humidity and 
temperature data provided in the PIGWISE data. For modeling the effect of 
these functional covariates, the additive predictor includes either a nonlinear 
concurrent interaction effect /(humj(f),temp^t), t) (ht) or a nonlinear time- 
constant concurrent interaction effect /(hunq(i),temp,(t)) (ht.c). 
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Fig 3. rRIMSE(fi(t)) and coveraqes for the different models for synthetic binomial data (60 
trials, logit- 1 (/3 0 (i)) e [0.04, 0.19] j. 


We simulate data based on additive predictors containing each of the terms 
described above on its own, as well as additive predictors containing either 
functional random intercepts (ri) or a smoothly varying day effect (day) and 
one of the remaining terms, for a total of 21 different combinations. For all of 
these settings, we vary the amplitude of the global functional intercept (small: 
logit -1 (/3o(i)) £ [0.06,0.13] Vi; intermediate: £ [0.04,0.19]; large: £ [0.02,0.34]). 
For the settings with just a single additional term we also varied the absolute 
effect size of these terms. For small effect sizes the respective exp (f r (X r i,t)) £ 
[0.5,2] Vi, is £ [0.2,5] for intermediate effect sizes and £ [0.1,10] for large 
effect sizes. We ran 30 replicates for each of the resulting 333 combinations 
of simulation parameters, for a total of 9990 fitted models. Since the settings’ 
effects are large and systematic and we use a fully crossed experimental design, 
30 replicates are more than enough to draw reliable qualitative conclusions from 
the results here. 

We focus on the results for trials = 60 with intermediate amplitude of the 
global intercept as these settings are most similar to the application. Results 
for the other settings are qualitatively similar, with the expected trends: perfor¬ 
mance improves for more informative data (i.e, more “trials”) and larger effect 
sizes and worsens for less informative data with fewer “trials” and smaller effect 
sizes. 

Figure 3 shows rRIMSE(r)(f)) (left panel) and average point-wise coverages 
(right panel) for the various settings ■ it is easy to see that the concurrent func¬ 
tional effects (humtemp, Irumtemp. c) and especially the daily random intercepts 
(ri) are the hardest to estimate. Further study shows that concurrent interac¬ 
tion effects are very sensitive to the rank of the functional covariates involved: 
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both humidity and temperature are practically constant in this data and show 
little variability, so effects are hard to estimate reliably in this special case due 
to a lack of variability in the covariate. The daily random intercepts are gen¬ 
erated from a Laplace distribution of the coefficients to yield “peaky” functions 
as seen in the application, not the Gaussian distribution our model assumes. 
Figure A.3 in Appendix A.l displays a typical fit for the setting with functional 
random intercepts and a cumulative autoregressive effect (ri+ff .6) and shows 
that rRIMSEs around .2 correspond to useful and sensible fits in this setting. 
Also note that performance improves dramatically if the Gaussianity assump¬ 
tion for the random intercept functions is met, c.f. the results in Sections 4.3 
and 4.4 for Gaussian random effects in some much simpler settings. 

Note that we are trying to estimate a latent smooth function for each day 
based only on the feeding episodes for that day for ri and, through smoothing, 
based on each day and its neighboring days in the case of day. The results show 
that this strategy can work rather well, albeit with a high computational cost in 
the case of ri: On our hardware, models with such functional random intercepts 
usually take between 5 minutes and 2 hours to fit, while models with a smooth 
day effect usually take between 15 seconds and 15 minutes, depending on which 
covariate effects are present in the model. See Figure A.l in Appendix A.l for 
details on computation times. 

As seen in the replication studies in Sections 4.3 and 4.4, estimating random 
intercepts with our approach also works as well or better than competing ap¬ 
proaches in more conventional settings with multiple replicates per group level 
rather than the one considered here with a single curve per group level. The 
results here still represent quite successful fits - Figure 4 shows results for a 
typical day+ff. 6-model that yields about the median RIMSE for that setting. 
Most estimates are very close to the “truth”. 

In terms of Cl coverage, we find that CIs for i%{t) (Figure 3, right panel) are 
usually close to their nominal 95% level. 

Appendix A.l shows typical fits for some of the remaining settings. In sum¬ 
mary, the proposed approach yields useful point estimates for most terms under 
consideration and confidence intervals with coverages consistently close to the 
nominal level on data with a similar structure as the application. 


4-2. Synthetic Data: Beta-, Negative Binomial-, t(3)-Distribution 

4-2.1. Data Generating Process 

We generate data with responses with scaled t(3)-distributed errors, Beta(a, j3)- 
distributed responses and Negative Binomial NB(^t, (^-distributed responses. We 
generate data with n = 100, 300 observations, each with signal-to-noise ratios 
SNR = 1,5. SNR is not easily adjustable for Negative Binomial data since 
variance and mean cannot be specified separately. In our simulation, we use 
g(-) = log(-), with 6 = 0.5, so that Var (y(t)) = E (y(t)) + 2E (y(t)) 2 . See Ap¬ 
pendix A.2 for details. 
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Fig 4. Typical model fit for setting day+ff.6: smoothly varying day effect with a cumulative 
auto-regressive term. Left column shows true (red, dashed) and estimated (black, solid) con¬ 
ditional expectation (top) for 12 randomly selected days in the top row, estimated (top panel) 
and true (bottom panel) smoothly varying day-daytime effect (middle and bottom rows). Right 
column shows true and estimated global intercept (top), estimated (middle, top panel) and true 
(middle, bottom panel) coefficient surface for the cumulative auto-regressive effect and true 
and estimated contributions of the cumulative auto-regressive effect to the additive predictor 
(bottom). For this model, rRIMSE(f}(t)) = 0.054, while the median rRIMSE for this setting 
is 0.069 
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Fig 5. Relative RIMSE (left) and achieved coverage (right, nominal 95%) for estimated ad¬ 
ditive predictor. Rows for the three distributions and columns for the 4 model settings. Blue 
for high noise settings with SNR = 1, red for SNR = 5, white for NB without available SNR. 
Horizontal axis represents the varying number of observations. Fat grey horizonal line denotes 
nominal 95% coverage. 


Functional responses are evaluated on T = 60 equidistant grid-points over 
[0,1]. We investigate performance for 4 different additive predictors: 

• int: E (y(t)) = g _1 (/3 0 (t)) 

• smoo: index-varying non-linear effect: E(y(f)) = g~ 1 (/3o(t) + f(x,t)) 

• te: non-linear interaction: E(y(t)) = g~ 1 {Po(t)+f(xx,X 2 )) via tensor prod¬ 
uct spline 

• ff: functional covariate: E = g _1 (/ 3o(t) + f x(s)/3(t, s)ds) 

50 replicates were run for each of the 32 settings for Beta and t( 3) and each of 
the 16 settings for NB. 

4-2.2. Results 

Figure 5 shows relative RIMSEs (left figure) and achieved point-wise coverages 
averaged across t and i (right figure, nominal 95%) for the estimated additive 
predictor for each setting and replicate. Both point estimation and uncer¬ 
tainty quantification work well in these settings. For the point estimates, we 
observe the expected patterns of increasing precision with bigger data sets and 
decreasing noise levels. The systematic under-coverage for n = 300, SNR= 5 
we observe for the Beta-settings for the model with a time-constant nonlinear 
interaction effect (third column, first row) seems of little practical relevance - 
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Fig 6. Relative RIMSE (left) and achieved coverage (right, nominal 95%) for estimated ef¬ 
fects. Columns for the three distributions and rows for settings with index-varying nonlinear 
effect, nonlinear interaction effect and linear function-on-function effect, respectively. Blue 
for high noise settings with SNR = 1, red for SNR = 5, white for NB without available SNR. 
Horizontal axis represents the varying number of observations. 
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in this setting, the point estimates are so close to the true effects and estima¬ 
tion uncertainty becomes so small that the CIs almost vanish, and in any case 
observed coverages mostly remain above 90% for nominal 95%. 

Figure 6 shows relative RIMSEs (left) and coverages (right) for the estimated 
effects in the three settings that include more than an intercept. Again, we 
observe the expected patterns of increasing precision with bigger data sets and 
decreasing noise levels. Estimates are generally fairly precise and most coverages 
are close to or greater than the nominal 95%-level. Systematic under-coverage 
occurs for the scalar interaction effect (middle column) for the Beta and t( 3)- 
distributions, but note again that given the high precision of the point estimates 
the fact that CIs tend to be too narrow is of little practical relevance. This could 
also indicate a basis specification that is too small to fit the true effect without 
approximation bias. We also observe a single replicate of setting smoo with t{ 3)- 
distributed errors with much larger rRIMSE and coverage around 20% (bottom 
row, leftmost panels, rightmost boxplots), indicating that the robustification by 
specifying a f-distribution may not always be successful. 

Median computation times for these fairly easy settings were between 3 and 
49 seconds. Estimating the bivariate non-linear effect for setting smoo took the 
longest, with some fits taking up to 8 minutes. We observed no relevant or 
systematic differences in computation times between the three response distri¬ 
butions we used. 

Appendix A.2 shows some typical results for these settings and gives addi¬ 
tional details on the data generating process as well as tabular representations 
of the performance measures in Figures 5 and 6. 


4-3. Replication of Simulation Study in Wang and Shi [26] 

This section describes our results of a replication of the simulation study de¬ 
scribed in Section 4.1 of Wang and Shi [26]. We ran 20 replicates per setting. The 
data comes from a logit model for binary data with just a functional intercept 
and observation-specific smooth functional random effects. The observation- 
specific smooth functional random effects bft) are realizations of a Gaussian 
process with “squared exponential” covariance structure, the functional inter¬ 
cept do (t) is a cubed sine-function, see Wang and Shi [26, their Section 4.1.] 
for details. The model and data generating process are thus P(yi(ti) = 1) = 
logit -1 (r]i(t)) with gft) = /3 0 (ti) + bfti)- i = 1,..., n; l = 1,..., T. 

Note that Wang & Shi’s MATLAB [23] implementation of their proposal 
(denoted by ggpfr) uses the same “squared exponential” covariance structure 
for fitting as that used for generating the data, while our pffr- implementation 
uses a B-spline basis (8 basis functions per subject) to represent the b.ft). We 
used the same basis dimension for (3o(t) for ggpfr as Wang & Shi. 

Boxplots in Figure 7 show RIMSEs for the estimated effects and the additive 
predictor for each of the 20 replicates per combination of settings. While ggpfr 
tends to yield more precise estimates for bi(t.) : it often does not do as well as pffr 
in estimating the functional intercept. As a consequence, the two approaches 
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Fig 7. Replication of Wang and Shi [26]: RIMSE for the estimated effects and the additive 
predictor. Top: functional intercept; middle: functional random effects; bottom: sum of the 
two. Each graph shows results for n = 30,60 (rows) and T = 21,41,61 (columns). Upper 
boxplots for pffr, lower for ggpfr. 
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Fig 8. Replication of Wang and Shi [26]: Observed Cl coverage for r){t) = logit (So(t) + 6j(t)) 
for n = 30, 60 (rows) and T = 21, 41, 61. Upper boxplots for pffr, lower for ggpfr. Vertical 
fat gray line denotes nominal 90% coverage. 
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Fig 9. Replication of Wang and Shi [26]: Computation times for n = 30,60 (rows) and 
T = 21, 41, 61 (columns). Upper boxplots for pffr, lower for ggpfr. Horizontal axis on log 10 - 
scale. 


yield very similar errors for r)i(t ), even though the data generating process is 
tailored towards ggpfr and more adversarial for pffr. Note that we were not 
able to reproduce the mean RIMSE values for logit(? 7 (f)) reported by Wang and 
Shi [26] despite the authors’ generous support. Our mean RIMSE results for 
ggpfr were 0.37, 0.33, 0.30 while Wang and Shi [26] reported 0.32, 0.26, 0.24 for 
n = 60; T = 21,41,61, respectively. 

Figure 8 shows observed Cl coverage for nominal 90% CIs for the two ap¬ 
proaches evaluated over Both approaches mostly do not achieve nominal 
coverage for these very small data sets, but coverages for pffr converge towards 
the nominal level much faster as n and T increase than those for ggpfr and show 
fewer and smaller incidences of severe under-coverage. Figure 9 shows compu¬ 
tation times for the two approaches. The median computation time for ggpfr 
is about three times that of pffr for T = 21, which increases to a factor of 
13-19 for n = 60; T = 41,61 and a factor of 40-48 for n = 30; T = 41, 61 1 * 3 . Note 
that computation times for ggpfr are given for a single run using a pre-specified 
number of basis functions for estimating [3o(t) here. But since the BIC-based se- 

1 Note that these computation times seem to be extremely sensitive towards different MAT- 

LAB versions - in personal communication, Bo Wang reported average computation times of 

3 minutes for T = 21, N = 60 (our result for ggpfr: ss 19), 4 minutes for T = 41, N = 30 (our 
result: 200) and 10 minutes for T = 61, N = 60 (our result: Ri 540) for a single run using 

MATLAB 7.4 on an Intel Core Duo CPU (2.53GHz, 3GB RAM) instead of MATLAB 7.12 
which we used. 
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lection of the basis dimension proposed by Wang and Shi [26] requires k runs for 
selecting between k candidate numbers of basis functions, actual computation 
times for ggpfr in practical applications will increase roughly A;-fold. 

Our replication shows that pff r achieves mostly similar estimation accuracies 
with better Cl coverage for a data generating process tailored specifically to 
ggpfr’s strengths. Depending on the MATLAB version that is used, pffr does 
so in similar or much, much shorter time. 

4-4- Replication of Simulation Study in Goldsmith et al. [4] 

This section describes our results of a replication of the simulation study de¬ 
scribed in Web Appendix 2 of Goldsmith et al. [4]. We ran 20 replicates per 
setting. The data comes from a logit model for binary data with a functional 
intercept, a functional linear effect of a scalar covariate and observation-specific 
smooth functional random effects. The observation-specific smooth functional 
random effects bi(t) are drawn using 2 trigonometric functional principal com¬ 
ponent functions, the functional intercept /3 0 (t) is a trigonometric function as 
well and the functional coefficient function f3i(t) associated with a _/V(0, 25)- 
distributed scalar covariate x is a scaled normal density function, see Goldsmith 
et al. [4, Web Appendix, Section 2] for details. Note that genfpca, the rstan 
[21] implementation provided by Goldsmith et al., uses the same number of 
FPCs (i.e, two) for the fit as used for generating the data, while our pffr- 
implementation uses 10 cubic B-spline basis functions to represent each bi(t). 
The model and data generating process are thus P(yji(ti) = 1) = logiW 1 (r}i(t)) 
with r]i(t) = po{ti) + P\{ti)Xi + bi(ti); i = 1,..., n; l = 1,..., T. 

Figure 10 shows RIMSE for (top to bottom) the estimated functional inter¬ 
cept, the estimated functional coefficient, the functional random effect and the 
combined additive predictor. It seems that pffr yields somewhat more precise 
estimates for /?o(f) in this setting, while the estimation performance of the two 
approaches for the functional coefficient and the random effects is broadly sim¬ 
ilar although the results for pffr for /3i(t) show more variability, pffr yields 
slightly better estimates of the total additive predictor r]i(t) in this setting. Note 
that Goldsmith et al. [4] report mean integrated square error (MISE, in their 
notation) while we report the RIMSE. Taking this into account, the genfpca 
results reported here correspond very closely to theirs. Figure 11 shows ob¬ 
served confidence [credibility] interval (Cl) coverage for nominal 95% CIs for 
the two approaches evaluated over rji(t). Both approaches mostly achieve close 
to nominal coverage in this setting, but while coverages for pffr are mostly 
greater than the nominal 95% for T = 50 with some rare problematic fits with 
coverages below 75%, genfpca shows small but systematic under-coverage for 
T = 100. Figure 12 shows computation times for the two approaches. The me¬ 
dian computation time for genfpca is about 4 times that of pffr for n = 50, 
and 0.80 to 0.94 that of pffr for n = 100. 

In summary, our replication shows that pffr achieves mostly similar estima¬ 
tion accuracies and coverages in much shorter time or roughly the same time 
for a data generating process tailored specifically to genfpca’s strengths. 
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Fig 10. Replication of Goldsmith et al. [ 4 .]: RIMSE for the estimated effects and the additive 
predictor. Top: functional intercept; 2nd row: functional coefficient; 3rd row: functional ran¬ 
dom effects; bottom: sum of the three. Each graph shows results for n = 50,100 (rows) and 
T = 50,100 (columns). Upper boxplots for pffr, lower for genfpca. 
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4-5. Summary of Simulation results 

We achieve useful and mostly reliable estimates for many complex settings in 
data structured like the PIGWISE data in acceptable time. Simulation results 
for less challenging settings and other response distributions than Binomial are 
good to excellent as well. Our partial replications of the simulation studies in 
Wang and Shi [26] and Goldsmith et al. [4] in Sections 4.3 and 4.4, respectively, 
show that our implementation achieves highly competitive results in similar or 
much shorter computation times even though the settings are tailored towards 
the competing earlier approaches which are also less flexible than ours by quite 
a large margin. 

5. Discussion 

This work introduces a comprehensive framework for generalized additive mixed 
models (GAMM) for non-Gaussian functional responses. Our implementation 
extends all the flexibility of GAMMs for dependent scalar responses to depen¬ 
dent functional responses and functional covariates, even for response distri¬ 
butions outside the exponential family such as robust models based on the t- 
distribution. Dependency structures can be spatial, temporal or hierarchical. 
Simulation and application results show that our approach provides reliable 
and precise inference about the underlying latent processes. 

The work presented here opens up promising new avenues of inquiry - one 
challenge that we have already begun to work on is to improve the speed and 
memory efficiency of the underlying computational engine to be able to fit the 
huge data sets increasingly common in functional data analysis. Along these 
lines, more efficient computation of simultaneous instead of pointwise confi¬ 
dence intervals for functional effects as well as a more detailed investigation 
of the performance of the available approximate pointwise confidence intervals 
for noisy and small non-Gaussian data is another important field of inquiry. 
An extension of the approach presented here to models with multiple additive 
predictors controlling different aspects of the responses’ distribution like the 
generalized additive models for location, scale and shape introduced for scalar 
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response by Rigby and Stasinopoulos [17] or zero-inflation and hurdle models - 
is yet another promising generalization of the ideas we have presented here. 
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Fig A.l. Computation times for the different models. Time axis on log 2 -scale. 


Appendix A: Additional Simulation Details 

Section A.2 provides some more details on the data generating process used in 
Section 4.2 and shows some typical effect estimates in these settings. 

A.l. Simulation 1: Computation times, Details and Examples 
A. 1.1. Computation times 

Figure A.l shows computation times for the various settings. Note that the 
times are plotted on binary log-scale. One fit for a random intercept model (ri) 
did not converge within 200 iterations and ran for more than 16 hours. 

A. 1.2. Computational details 

We use the following (spline) bases to construct & x for the different effects: 

• lag. c, lag: 20 cubic thin plate splines over y(t) 

• humtemp, humtemp.c: 50 bivariate cubic thin plate splines over the joint 
space of hum and temp. 

• day: 8 cubic thin plate splines over i = 1,..., 100. 

• ri: dummy variables for each curve i = 1,..., 100. 


















Scheipl, Gertheiss, Greven/GFAMM 


A.28 


The coefficient surfaces for ff .3, ff.6 are estimated with 30 bivariate cubic 
thin plate splines over T x T. The functional intercept (int) uses 40 cubic 
cyclic P-splines with first order difference penalty over T, while the functional 
random intercepts (ri) use 9 of those per curve. For all other terms, we use 8 
cubic P-splines with first order difference penalty for &t- 

A. 1.3. Typical model fits 

Figures A.2 to A.4 show graphical summaries of typical model fits for three 
difficult settings of the simulated binomial data discussed in Section 4.1. It is 
clear to see that estimating unstructured daily functional random intercepts is 
a harder task than estimating a smoothly varying aging effect (compare the 
accuracy of f(t,i) in Figures A.2 and A.4 to that of bft) in Figure A. 3), and 
that concurrent functional nonlinear interaction effects can not be recovered very 
reliably in this setting. Also note that we can model very rough latent response 
processes pft) for these data by including, e.g., a (nonlinear) auto-regressive 
term as in the top left panel of Figure A.2. 

A.2. Simulation Study 2: Details & Typical Fits 
A.2.1. Data Generating Process for Simulation Study 2 

We use different signal-to-noise ratios (SNR) to control the difficulty of the sim¬ 
ulation settings. For Beta-distributed responses with g(-) = logit -1 (•), distribu¬ 
tion parameters a, /3 are determined by approximate SNR via 4> — SNR^hi — 1, 

with = N^ 1 t ~ A kt), where v^ is the sample variance of the simu¬ 

lated pu and p = logit -1 (rj). Beta parameters for generating y are then given 
by a = (j>p and /3 = (f> — a. r) is first scaled linearly to the interval [—1.5,1.5]. 

A.2.2. Computational details 

We use the following spline bases to construct for the different effects: 

• smoo: 8 cubic thin plate splines over the range of x 

• te: 45 bivariate cubic thin plate splines over the joint space of x\ and X 2 

• ff: 5 cubic P-splines with first order difference penalty over S 

The functional intercept (int) uses 40 cubic cyclic P-splines with first order 
difference penalty over T. For all other terms, we use 5 cubic P-splines with 
first order difference penalty for 3> t . 

A.2.3. Typical Fits for Simulation Study 2 

Figures A.5 to A.7 show some typical fits for these data generating processes. 
Figure A.5 shows estimated effects for a Beta-response model with a nonlinear 
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Fig A.2. Typical model fit for setting day+lag. c: smoothly varying day effect with a nonlin¬ 
ear auto-regressive term. Left column shows true (dashed, red) and estimated (solid, black) 
conditional expectation for 12 randomly selected days in the top row and estimated and true 
smoothly varying day-daytime effect in the middle and bottom panels. Right column shows 
true and estimated global intercept (top) and true and estimated nonlinear auto-regressive 
effect (middle) and its true and estimated contributions to the additive predictor. For this 
model, rRIMSE(fji{t)) = 0.05, about the same as the median rRIMSE for this setting. 
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Fig A. 3. Typical model fit for setting ri+ff. 6: random functional intercepts for each day with 
a cumulative auto-regressive term. Left column shows true (dashed, red) and estimated (solid, 
black) conditional expectation (top row) and estimated and true random intercepts (middle). 
Right column shows true and estimated global intercept (top) and estimated (middle, top 
panel) and true (middle, bottom panel) coefficient surface for the cumulative auto-regressive 
effect and true and estimated contributions of the cumulative auto-regressive effect to the 
additive predictor (bottom) for 12 randomly selected days. For this model, rRIMSE(fj(t)) = 
0.2, while the median rRIMSE for this setting is 0.19. 
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Fig A.4. Typical model fit for setting day+humtemp. c: smoothly varying day effect with a 
time-constant concurrent nonlinear interaction effect of humidity and temperature. Left col¬ 
umn shows true (dotted) and estimated (solid) conditional expectation (top) for 12 randomly 
selected days in the top row, estimated (top panel) and true (bottom panel) smoothly vary¬ 
ing day-daytime effect (middle), and true and estimated contributions of the time-constant 
humidity-temperature interaction effect to the additive predictor (bottom). Right column shows 
true and estimated global intercept (top) and estimated (middle, top panel) and true (mid¬ 
dle, bottom panel) effect surface for the humidity-temperature interaction. For this model, 
rRIMSE(fj(t)) = 0.05, while the median rRIMSE for this setting is 0.05. 
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Fig A.5. Typical model fit for a Beta-distributed model with a nonlinear effect of scalar 
covariate (smoo, n = 300, SNR = 1) with rRIMSE(fj(t)) ~ 0.047 and coverage ~ 0.98 for fj(t). 
Red line in the left panel shows true intercept function, light grey ribbon gives approximate 
pointwise 95% interval. On the right, top panel shows estimated nonlinear effect of scalar 
covariate f(x,t) and bottom panel the true f(x,t). 


effect of a scalar covariate (smoo, n = 300, SNR = 1), Figure A.6 shows esti¬ 
mated effects for a Negative Binomial-response model with a linear function- 
on-function effect (ff, n = 100) and Figure A.7 shows estimated effects for a 
t(3)-response model with a nonlinear interaction effect of two scalar covariates 
(te, n = 100, SNR = 5) 


A.2.4- Tabular Results for Simulation Study 2 

Tables 2 and 3 show median rRIMSES and coverages for simulation study 4.2 
for the entire additive predictor and the estimated covariate effects, respectively. 

A.3. Computational Details 

Code to reproduce results for Section 4 is included in a code supplement. 
Results described here were computed on various Linux PCs and servers under 

R-3.2.3 to 3.2.5 with mgcv 1.8-5 to 1.8-7, refundDevel 0.1-11 to 0.1-15. 
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Fig A.6. Typical model fit for a NB-distributed model with a linear effect of a functional 
covariate (ff, n = 100J with rRIMSE(fj(t)) ~ 0.083 and coverage ~ 0.97 for r)(t). Red 
dotted line in the left panel shows true intercept function, light grey ribbon gives approximate 
pointwise 95% interval. On the right, top panel shows estimated (3(s,t ) and bottom panel the 
true (3(s,t) for the function-on-function effect. 
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Po(t) and Po(t) 




Fig A.7. Typical model fit for a 13-distributed model with a nonlinear interaction effect of two 
scalar covariates (te, n = 300, SNR = 5) with rRIMSE(fj(t)) ~ 0.043 and coverage ~ 0.94 
for f)(t). Red dotted line in the left panel shows true intercept function, light grey ribbon 
gives approximate pointwise 95% interval. On the right, top panel shows estimated smooth 
interaction f(xi,X 2 ) and bottom panel the true f(x 1 , 0 : 2 ). 


f 



Fig B.l. Observed feeding episodes for pig 57. Horizontal axis gives time of day, vertical axis 
represents days. Black dots show observed feeding episodes as measured by proximity to the 
trough (yes-no). 
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Appendix B: PIGWISE Model: Alternatives and Criticism 

This section shows the raw data used for the PIGWISE application (Figure B.l) 
and compares the result of a different model specification to that of the main 
article. 

Beyond the smoothly varying functional day effects used in the analysis in 
Section 3.2, we considered two alternative random effect specifications: auto- 
correlated functional random day effects with a marginal AI?(l)-structure with 
auto-correlation 0.8 over the days as well as i. i. d. random day effects b t (i). We 
also considered models with no day effects at all. 

Beyond the 3h-cumulative auto-regressive effect of previous feeding used 
in the analysis in Section 3.2, we also fit models with a 6h-cumulative auto¬ 
regressive effect Ui(s)/3(t, s)ds) as well as non-linear time-constant 

auto-regressive effects (f(yi{t— lOmin))) and time-varying (— 10min),f)). 
We also considered models with no auto-regressive effects at all. 

To model possible effects of humidity and temperature, we considered ad¬ 
ditive non-linear time-varying (/(hum(t),f) + /(temp(f), t)) and time-constant 
(/(huin(t)) + /(temp(t))) concurrent effects, as well as corresponding concurrent 
interaction effects /(hum(t),temp(t)) and /(hum(t),temp(f), t), respectively. 
We also considered models with no effects of humidity and/or temperature at 
all. 

We estimated models for all 100 combinations (4 day effects, 5 auto-regressive 
effects, 5 humidity/temperature effects) of the different effect specifications given 
above. As in the main analysis, we excluded every third day from the training 
sample to serve as a validation set. Analysis of mean Brier scores achieved on 
the training data showed that all 100 models fit the training data similarly well, 
with slightly better fits for models with i. i. d. or AI?(1) functional random effects 
compared to models with no day effects or a smooth day effect. Analysis of mean 
Brier scores on the validation data, however, showed that only very small differ¬ 
ences in predictive accuracy between models with a smooth day effect and no day 
effects at all exist and also revealed a strong decrease in predictive performance 
for models with humidity-temperature effects, especially in combination with 
cumulative auto-regressive effects. All in all, the majority of models performed 
worse in terms of predictive Brier score than a simple functional intercept model 
yi(t) ~ B(60, logit -1 (/3o(£))), which had a mean predictive Brier score of 0.026, 
compared to a mean predictive Brier score of 0.029 for the i. i. d. functional 
random intercept model discussed below and a mean predictive Brier score of 
0.024 of a model like the one discussed in Section 3.3 with a smoothly varying 
day effect better suited for generating interpolating predictions for missing days 
instead of functional random day effects, which are constant 0 for days in the 
test set. We conclude that overfitting of complex effects could be a serious con¬ 
cern for the model class we propose, as shown by the large differences in Brier 
scores between training and validation data for many of the models with com¬ 
plicated effect structures. Note, however, that even though the i. i. d. random 
effect model yields inferior predictions, it may be better suited for estimating 
explanatory and interpretable models than the one discussed in Section 3.2, as 


Scheipl, Gertheiss, Greven/GFAMM 


B.36 


(i(t) and y(t) for selected days: training data 


1 

10 

19 

iULl 

28 

ililb 

1 iLii 

46 

m iliii 

56 

JfcMk 

65 

i MtL 

74 

UUk 

83 

UUlfJ 

92 

101 

PHii 


0 6 12 18 24 0 6 12 18 24 0 6 12 18 24 0 6 12 18 24 


lift) and y(t) for selected days: validation data 



Fig B.2. Fitted (top) and predicted (bottom) values for pig 51 for selected days for an alter¬ 
native model specification with i. i. d. functional random day effects instead of a smooth aging 
effect. Black lines for fitted (top) and predicted (bottom) values jifit). red for observed feeding 
rate y,; ()) /GO. Numbers above each panel give the day. Vertical axis on -scale. 


the larger flexibility of the i. i. d. functional random day effects is presumably 
more effective at modeling the peaky and irregular temporal dependencies in 
the observations than a smoothly varying aging effect. 

We present detailed results for an alternative model specification with i. i. d. 
functional random day effects bi(t) instead of a smooth aging effect f(i , t) below. 
Figure B.2 shows fitted and observed values for 12 selected days from the training 
data (top) as well as predicted and observed values for 12 selected days from the 
validation data (bottom). It is easy to see that the model is able to reproduce 
many of the feeding episodes in the training data, in the sense that peaks in the 
estimated probability curves mostly line up well with observed spikes of y%{t). 
This model explains about 41% of the deviance and achieves a Brier score of 
about 0.019 taking the mean over all days and time-points. Prediction (lower 
panel) with this model is more challenging (Brier score: 0.029), and succeeds 
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Fig B.3. Estimated effects for an alternative model specification with i. i. d. functional random 
day effects instead of a smooth aging effect. Top row: Estimated functional intercept and 
functional random day effects. Bottom row: Estimated coefficient surface for cumulative auto¬ 
regressive effect. Intervals are ± 2 standard errors. Vertical axis for functional random day 
effects is truncated at —10 to show structure of typical results. 


only partially. For example, while the predictions for day 73 are mostly very 
good, predictions for the evening hours of day 11 or around noon on day 82 
are far off the mark. Note, however, that bi(t) = 0 for the validation data, as 
E{bi{t)) = 0Models that include a smooth day effect instead, which 

can be interpolated for the days in the calibration data, are more successful at 
prediction, see Section 3.3. 

Figure B.3 shows the estimated components of the additive predictor for this 
alternative model specification. The functional intercept in the top left panel 
of Figure B.3 is fairly similar in shape to Figure 2, but overall the base rate is 
estimated to be much higher and the peak before 6h is much less pronounced. 
Estimated random day effects are shown in the top right panel of Figure B.3. 
In terms of absolute size, these are much larger than the values of the smooth 
aging effect depicted in Figure 2. Effect sizes for bi(t) are often unrealistically 
large, with some estimates going as low as -20 (-20 on the logit scale corresponds 
to a reduction of the probability by a factor of about 2 • 10 -9 ), presumably an 
artifact of the very low feeding activity between midnight and early morning 
leading to (quasi-)complete separability of the data. The vertical axis in Figure 
B.3 is cut off at —10 in order to showcase the typical “peaky” structure of the 
bi(t) which would otherwise be hidden by the larger vertical axis required to 
show the 12 random intercept functions reaching minimal values below —10 
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in their entirety. The cumulative auto-regressive effect of feeding during the 
previous 3 hours displayed in the bottom row of Figure B.3 is quite similar 
to the estimate for the original model shown in Figure 2 but with a weaker 
positive association between feeding in the immediate past and current feeding 
and a stronger negative association for feeding episodes in the early morning 
hours. 
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Family 

Set 

SNR 

n 

Median rRIMSE(?}(t)) 

<725 

975 

Median coverage 

<725 

975 

Beta 

int 

1 

100 

0.06 

0.05 

0.06 

0.97 

0.95 

0.98 




300 

0.04 

0.04 

0.04 

0.97 

0.93 

0.98 



5 

100 

0.03 

0.03 

0.04 

0.96 

0.93 

0.98 




300 

0.02 

0.02 

0.02 

0.94 

0.92 

0.97 


smoo 

1 

100 

0.08 

0.07 

0.08 

0.98 

0.96 

0.99 




300 

0.05 

0.05 

0.06 

0.97 

0.95 

0.98 



5 

100 

0.04 

0.04 

0.05 

0.97 

0.96 

0.98 




300 

0.03 

0.03 

0.03 

0.96 

0.95 

0.97 


te 

1 

100 

0.10 

0.09 

0.11 

0.97 

0.96 

0.98 




300 

0.07 

0.06 

0.07 

0.96 

0.95 

0.97 



5 

100 

0.05 

0.05 

0.06 

0.95 

0.94 

0.96 




300 

0.04 

0.04 

0.04 

0.93 

0.91 

0.94 


fr 

1 

100 

0.07 

0.07 

0.08 

0.96 

0.94 

0.98 




300 

0.05 

0.05 

0.05 

0.96 

0.94 

0.97 



5 

100 

0.04 

0.04 

0.04 

0.96 

0.94 

0.97 




300 

0.03 

0.02 

0.03 

0.95 

0.93 

0.96 

NB 

int 

NA 

100 

0.05 

0.05 

0.06 

0.97 

0.95 

0.98 




300 

0.04 

0.03 

0.04 

0.95 

0.93 

0.97 


smoo 


100 

0.08 

0.07 

0.09 

0.97 

0.96 

0.98 




300 

0.05 

0.05 

0.06 

0.97 

0.96 

0.98 


te 


100 

0.14 

0.13 

0.15 

0.97 

0.96 

0.98 




300 

0.10 

0.09 

0.11 

0.97 

0.95 

0.98 


fr 


100 

0.08 

0.08 

0.09 

0.96 

0.94 

0.97 




300 

0.05 

0.05 

0.06 

0.95 

0.94 

0.97 

t(3) 

int 

1 

100 

0.07 

0.06 

0.08 

0.97 

0.95 

0.98 




300 

0.05 

0.04 

0.05 

0.97 

0.95 

0.98 



5 

100 

0.04 

0.03 

0.04 

0.95 

0.93 

0.97 




300 

0.02 

0.02 

0.03 

0.95 

0.92 

0.97 


smoo 

1 

100 

0.09 

0.08 

0.10 

0.98 

0.96 

0.99 




300 

0.06 

0.05 

0.06 

0.97 

0.96 

0.98 



5 

100 

0.05 

0.05 

0.05 

0.97 

0.96 

0.98 




300 

0.03 

0.03 

0.03 

0.97 

0.95 

0.97 


te 

1 

100 

0.12 

0.11 

0.12 

0.97 

0.96 

0.98 




300 

0.08 

0.07 

0.08 

0.96 

0.95 

0.97 



5 

100 

0.06 

0.06 

0.07 

0.95 

0.94 

0.97 




300 

0.04 

0.04 

0.04 

0.94 

0.93 

0.95 


ff 

1 

100 

0.09 

0.08 

0.09 

0.96 

0.94 

0.98 




300 

0.06 

0.05 

0.06 

0.96 

0.94 

0.97 



5 

100 

0.05 

0.04 

0.05 

0.96 

0.94 

0.97 




300 

0.03 

0.03 

0.03 

0.95 

0.93 

0.96 


Table 2 

Tabular display of results in Figure 5. Median and 25% and 75% quantiles for relative 
RIMSE and pointwise coverages for the additive predictor fj(t). 
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Family 

Set 

SNR 

n 

Median rRIMSE(/(A' r , t)) 

<725 

975 

Median coverage 

<725 

975 

Beta 

smoo 

1 

100 

0.29 

0.25 

0.33 

0.99 

0.97 

1.00 




300 

0.20 

0.18 

0.22 

0.99 

0.97 

1.00 



5 

100 

0.17 

0.16 

0.19 

0.99 

0.97 

1.00 




300 

0.12 

0.11 

0.13 

0.98 

0.96 

0.99 


te 

1 

100 

0.18 

0.16 

0.20 

0.94 

0.92 

0.96 




300 

0.11 

0.10 

0.12 

0.94 

0.92 

0.96 



5 

100 

0.13 

0.11 

0.15 

0.90 

0.88 

0.92 




300 

0.07 

0.07 

0.08 

0.89 

0.87 

0.91 


a 

1 

100 

0.32 

0.28 

0.36 

0.93 

0.87 

0.97 




300 

0.20 

0.18 

0.23 

0.94 

0.90 

0.99 



5 

100 

0.17 

0.15 

0.19 

0.95 

0.90 

0.99 




300 

0.11 

0.10 

0.12 

0.94 

0.91 

0.98 

NB 

smoo 

NA 

100 

0.31 

0.27 

0.35 

0.99 

0.97 

1.00 




300 

0.21 

0.18 

0.22 

0.99 

0.97 

1.00 


te 


100 

0.23 

0.22 

0.26 

0.96 

0.93 

0.97 




300 

0.16 

0.14 

0.17 

0.96 

0.94 

0.97 


a 


100 

0.35 

0.31 

0.40 

0.93 

0.85 

0.97 




300 

0.22 

0.19 

0.24 

0.94 

0.89 

0.97 

t(3) 

smoo 

1 

100 

0.32 

0.29 

0.36 

1.00 

0.98 

1.00 




300 

0.22 

0.20 

0.25 

0.99 

0.97 

1.00 



5 

100 

0.19 

0.18 

0.21 

0.99 

0.97 

0.99 




300 

0.13 

0.12 

0.14 

0.98 

0.96 

0.99 


te 

1 

100 

0.20 

0.18 

0.22 

0.95 

0.93 

0.97 




300 

0.12 

0.11 

0.13 

0.95 

0.94 

0.97 



5 

100 

0.14 

0.12 

0.15 

0.91 

0.89 

0.93 




300 

0.08 

0.07 

0.09 

0.90 

0.88 

0.92 


a 

1 

100 

0.36 

0.32 

0.41 

0.93 

0.87 

0.98 




300 

0.24 

0.20 

0.26 

0.94 

0.89 

0.98 



5 

100 

0.19 

0.17 

0.22 

0.94 

0.90 

0.99 




300 

0.13 

0.11 

0.14 

0.94 

0.90 

0.97 


Table 3 

Tabular display of results in Figure 6. Median and 25% and 75% quantiles for relative 
RIMSE and pointwise coverages for the estimated effects f(X r ,t). 







